OCR exploration server

Warning: this site is under development!
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

A hierarchical and scalable model for contemporary document image segmentation

Internal identifier: 000209 (Main/Exploration); previous: 000208; next: 000210

Authors: Asma Ouji [France]; Yann Leydier [France]; Frank Lebourgeois [France]

Source: Pattern analysis and applications (Print), 2013, ISSN 1433-7541

RBID : Pascal:14-0075945

French descriptors

English descriptors

Abstract

In this paper, we introduce a novel color segmentation approach that is robust against digitization noise and adapted to contemporary document images. The system is scalable, hierarchical, versatile and completely automated, i.e. user-independent. It performs an adaptive binarization/quantization without any penalizing information loss. The model may be used for many purposes. For instance, we rely on it to carry out the first steps leading to advertisement recognition in document images. Furthermore, the color segmentation output is used to localize text areas and enhance optical character recognition (OCR) performance. We ran tests on a variety of magazine images to highlight our contribution to the well-known OCR product ABBYY FineReader. We also obtain promising results with our ad detection system on a large set of test images with complex layouts.


Affiliations:



The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en" level="a">A hierarchical and scalable model for contemporary document image segmentation</title>
<author>
<name sortKey="Ouji, Asma" sort="Ouji, Asma" uniqKey="Ouji A" first="Asma" last="Ouji">Asma Ouji</name>
<affiliation wicri:level="3">
<inist:fA14 i1="01">
<s1>Université de Lyon, CNRS, INSA-Lyon, LIRIS, UMR5205, 20 av. Albert Einstein</s1>
<s2>Villeurbanne 69621</s2>
<s3>FRA</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
</inist:fA14>
<country>France</country>
<placeName>
<region type="region" nuts="2">Auvergne-Rhône-Alpes</region>
<region type="old region" nuts="2">Rhône-Alpes</region>
</placeName>
</affiliation>
</author>
<author>
<name sortKey="Leydier, Yann" sort="Leydier, Yann" uniqKey="Leydier Y" first="Yann" last="Leydier">Yann Leydier</name>
<affiliation wicri:level="3">
<inist:fA14 i1="01">
<s1>Université de Lyon, CNRS, INSA-Lyon, LIRIS, UMR5205, 20 av. Albert Einstein</s1>
<s2>Villeurbanne 69621</s2>
<s3>FRA</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
</inist:fA14>
<country>France</country>
<placeName>
<region type="region" nuts="2">Auvergne-Rhône-Alpes</region>
<region type="old region" nuts="2">Rhône-Alpes</region>
</placeName>
</affiliation>
</author>
<author>
<name sortKey="Lebourgeois, Frank" sort="Lebourgeois, Frank" uniqKey="Lebourgeois F" first="Frank" last="Lebourgeois">Frank Lebourgeois</name>
<affiliation wicri:level="3">
<inist:fA14 i1="01">
<s1>Université de Lyon, CNRS, INSA-Lyon, LIRIS, UMR5205, 20 av. Albert Einstein</s1>
<s2>Villeurbanne 69621</s2>
<s3>FRA</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
</inist:fA14>
<country>France</country>
<placeName>
<region type="region" nuts="2">Auvergne-Rhône-Alpes</region>
<region type="old region" nuts="2">Rhône-Alpes</region>
</placeName>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">INIST</idno>
<idno type="inist">14-0075945</idno>
<date when="2013">2013</date>
<idno type="stanalyst">PASCAL 14-0075945 INIST</idno>
<idno type="RBID">Pascal:14-0075945</idno>
<idno type="wicri:Area/PascalFrancis/Corpus">000026</idno>
<idno type="wicri:Area/PascalFrancis/Curation">000738</idno>
<idno type="wicri:Area/PascalFrancis/Checkpoint">000052</idno>
<idno type="wicri:doubleKey">1433-7541:2013:Ouji A:a:hierarchical:and</idno>
<idno type="wicri:Area/Main/Merge">000212</idno>
<idno type="wicri:Area/Main/Curation">000209</idno>
<idno type="wicri:Area/Main/Exploration">000209</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en" level="a">A hierarchical and scalable model for contemporary document image segmentation</title>
<author>
<name sortKey="Ouji, Asma" sort="Ouji, Asma" uniqKey="Ouji A" first="Asma" last="Ouji">Asma Ouji</name>
<affiliation wicri:level="3">
<inist:fA14 i1="01">
<s1>Université de Lyon, CNRS, INSA-Lyon, LIRIS, UMR5205, 20 av. Albert Einstein</s1>
<s2>Villeurbanne 69621</s2>
<s3>FRA</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
</inist:fA14>
<country>France</country>
<placeName>
<region type="region" nuts="2">Auvergne-Rhône-Alpes</region>
<region type="old region" nuts="2">Rhône-Alpes</region>
</placeName>
</affiliation>
</author>
<author>
<name sortKey="Leydier, Yann" sort="Leydier, Yann" uniqKey="Leydier Y" first="Yann" last="Leydier">Yann Leydier</name>
<affiliation wicri:level="3">
<inist:fA14 i1="01">
<s1>Université de Lyon, CNRS, INSA-Lyon, LIRIS, UMR5205, 20 av. Albert Einstein</s1>
<s2>Villeurbanne 69621</s2>
<s3>FRA</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
</inist:fA14>
<country>France</country>
<placeName>
<region type="region" nuts="2">Auvergne-Rhône-Alpes</region>
<region type="old region" nuts="2">Rhône-Alpes</region>
</placeName>
</affiliation>
</author>
<author>
<name sortKey="Lebourgeois, Frank" sort="Lebourgeois, Frank" uniqKey="Lebourgeois F" first="Frank" last="Lebourgeois">Frank Lebourgeois</name>
<affiliation wicri:level="3">
<inist:fA14 i1="01">
<s1>Université de Lyon, CNRS, INSA-Lyon, LIRIS, UMR5205, 20 av. Albert Einstein</s1>
<s2>Villeurbanne 69621</s2>
<s3>FRA</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
</inist:fA14>
<country>France</country>
<placeName>
<region type="region" nuts="2">Auvergne-Rhône-Alpes</region>
<region type="old region" nuts="2">Rhône-Alpes</region>
</placeName>
</affiliation>
</author>
</analytic>
<series>
<title level="j" type="main">Pattern analysis and applications : (Print)</title>
<title level="j" type="abbreviated">Pattern anal. appl. : (Print)</title>
<idno type="ISSN">1433-7541</idno>
<imprint>
<date when="2013">2013</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
<seriesStmt>
<title level="j" type="main">Pattern analysis and applications : (Print)</title>
<title level="j" type="abbreviated">Pattern anal. appl. : (Print)</title>
<idno type="ISSN">1433-7541</idno>
</seriesStmt>
</fileDesc>
<profileDesc>
<textClass>
<keywords scheme="KwdEn" xml:lang="en">
<term>Adaptive method</term>
<term>Advertising</term>
<term>Character recognition</term>
<term>Color image</term>
<term>Computer vision</term>
<term>Digitizing</term>
<term>Document analysis</term>
<term>Document processing</term>
<term>Document structure</term>
<term>Hierarchical system</term>
<term>Image processing</term>
<term>Image recognition</term>
<term>Image segmentation</term>
<term>Information loss</term>
<term>Modeling</term>
<term>Noisy image</term>
<term>Optical character recognition</term>
<term>Robustness</term>
<term>Scalability</term>
<term>Signal quantization</term>
<term>Text</term>
</keywords>
<keywords scheme="Pascal" xml:lang="fr">
<term>Traitement document</term>
<term>Traitement image</term>
<term>Image couleur</term>
<term>Numérisation</term>
<term>Extensibilité</term>
<term>Perte information</term>
<term>Analyse documentaire</term>
<term>Reconnaissance image</term>
<term>Vision ordinateur</term>
<term>Texte</term>
<term>Reconnaissance optique caractère</term>
<term>Reconnaissance caractère</term>
<term>Structure document</term>
<term>Quantification signal</term>
<term>Publicité</term>
<term>Système hiérarchisé</term>
<term>Robustesse</term>
<term>Méthode adaptative</term>
<term>Modélisation</term>
<term>Image bruitée</term>
<term>Segmentation image</term>
</keywords>
<keywords scheme="Wicri" type="topic" xml:lang="fr">
<term>Numérisation</term>
<term>Publicité</term>
</keywords>
</textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">In this paper, we introduce a novel color segmentation approach that is robust against digitization noise and adapted to contemporary document images. The system is scalable, hierarchical, versatile and completely automated, i.e. user-independent. It performs an adaptive binarization/quantization without any penalizing information loss. The model may be used for many purposes. For instance, we rely on it to carry out the first steps leading to advertisement recognition in document images. Furthermore, the color segmentation output is used to localize text areas and enhance optical character recognition (OCR) performance. We ran tests on a variety of magazine images to highlight our contribution to the well-known OCR product ABBYY FineReader. We also obtain promising results with our ad detection system on a large set of test images with complex layouts.</div>
</front>
</TEI>
<affiliations>
<list>
<country>
<li>France</li>
</country>
<region>
<li>Auvergne-Rhône-Alpes</li>
<li>Rhône-Alpes</li>
</region>
</list>
<tree>
<country name="France">
<region name="Auvergne-Rhône-Alpes">
<name sortKey="Ouji, Asma" sort="Ouji, Asma" uniqKey="Ouji A" first="Asma" last="Ouji">Asma Ouji</name>
</region>
<name sortKey="Lebourgeois, Frank" sort="Lebourgeois, Frank" uniqKey="Lebourgeois F" first="Frank" last="Lebourgeois">Frank Lebourgeois</name>
<name sortKey="Leydier, Yann" sort="Leydier, Yann" uniqKey="Leydier Y" first="Yann" last="Leydier">Yann Leydier</name>
</country>
</tree>
</affiliations>
</record>
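
The record above can also be read outside Dilib. The following is a minimal sketch, assuming Python's standard xml.etree.ElementTree and a shortened excerpt of the record. The record uses the prefixes wicri: and inist: without declaring them, so a namespace-aware parser rejects it as written; the sketch binds them to placeholder URIs on the root element. Those URIs, and the helper names parse_record and summarize, are assumptions made for illustration and are not part of Wicri or Dilib.

```python
import xml.etree.ElementTree as ET

# Placeholder namespace URIs (assumptions): the page does not declare any.
WICRI_NS = "urn:example:wicri"
INIST_NS = "urn:example:inist"

# Shortened excerpt of the record (one author and two keywords kept).
RECORD = """<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en" level="a">A hierarchical and scalable model for contemporary document image segmentation</title>
<author>
<name sortKey="Ouji, Asma" sort="Ouji, Asma" uniqKey="Ouji A" first="Asma" last="Ouji">Asma Ouji</name>
<affiliation wicri:level="3">
<inist:fA14 i1="01">
<s2>Villeurbanne 69621</s2>
</inist:fA14>
<country>France</country>
</affiliation>
</author>
</titleStmt>
</fileDesc>
<profileDesc>
<textClass>
<keywords scheme="KwdEn" xml:lang="en">
<term>Adaptive method</term>
<term>Advertising</term>
</keywords>
</textClass>
</profileDesc>
</teiHeader>
</TEI>
</record>"""


def parse_record(xml_text):
    """Bind the undeclared prefixes on the root element, then parse."""
    patched = xml_text.replace(
        "<record>",
        '<record xmlns:wicri="%s" xmlns:inist="%s">' % (WICRI_NS, INIST_NS),
        1,  # only the first occurrence, i.e. the root tag
    )
    return ET.fromstring(patched)


def summarize(root):
    """Collect the title, (author, country) pairs and English keywords."""
    title_stmt = root.find("./TEI/teiHeader/fileDesc/titleStmt")
    authors = [
        (author.findtext("name"), author.findtext("affiliation/country"))
        for author in title_stmt.findall("author")
    ]
    keywords = [t.text for t in root.findall(".//keywords[@scheme='KwdEn']/term")]
    return {
        "title": title_stmt.findtext("title"),
        "authors": authors,
        "keywords": keywords,
    }


if __name__ == "__main__":
    print(summarize(parse_record(RECORD)))
```

Declaring the prefixes leaves the record's data untouched, which seemed safer than rewriting the prefixed names. Attribute values such as wicri:source are plain strings and need no special handling.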

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/OcrV1/Data/Main/Exploration
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000209 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd -nk 000209 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Ticri/CIDE
   |area=    OcrV1
   |flux=    Main
   |étape=   Exploration
   |type=    RBID
   |clé=     Pascal:14-0075945
   |texte=   A hierarchical and scalable model for contemporary document image segmentation
}}
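
When many records need such a link, the template text can be generated rather than typed by hand. The sketch below is only an illustration: the function explor_link is hypothetical (it is not a Wicri or Dilib tool) and simply reproduces the layout of the template shown above, with the parameter names wiki, area, flux, étape, type, clé and texte copied from it.

```python
def explor_link(wiki, area, flux, etape, type_, cle, texte):
    """Render an 'Explor lien' wiki template (hypothetical helper).

    The layout copies the template above: three leading spaces, a pipe,
    then 'name=' padded to nine characters before the value.
    """
    params = [
        ("wiki", wiki),
        ("area", area),
        ("flux", flux),
        ("étape", etape),
        ("type", type_),
        ("clé", cle),
        ("texte", texte),
    ]
    lines = ["{{Explor lien"]
    for name, value in params:
        lines.append("   |" + (name + "=").ljust(9) + value)
    lines.append("}}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(explor_link(
        wiki="Ticri/CIDE",
        area="OcrV1",
        flux="Main",
        etape="Exploration",
        type_="RBID",
        cle="Pascal:14-0075945",
        texte="A hierarchical and scalable model for contemporary document image segmentation",
    ))
```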

Wicri

This area was generated with Dilib version V0.6.32.
Data generation: Sat Nov 11 16:53:45 2017. Site generation: Mon Mar 11 23:15:16 2024